The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective
Abstract
MOTIVATION: Automated functional annotation of biological macromolecules is the problem of computationally assigning biological concepts or ontological terms to genes and gene products. A number of methods annotate genes computationally using standardized nomenclature such as the Gene Ontology (GO). However, two questions remain open. Can accurate methods be developed that integrate disparate molecular data? And can such methods be evaluated without bias? One important concern is that experimental annotations of proteins are incomplete. This raises the question of whether, and to what degree, currently available data can reliably be used to train computational models and to estimate their accuracy.

RESULTS: We study how incomplete experimental annotations affect the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations. These characterize how growing experimental annotations affect the correctness and stability of performance estimates for different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation of GO term predictions, with re-evaluation taking place after additional experimental annotations become available. Our results agree with previous observations that incomplete and accumulating experimental annotations can significantly affect accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, the performance metric and the underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
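The toy sketch below shows how an accuracy estimate can shift when a protein's annotation grows between evaluation and re-evaluation. It is not the authors' code and does not implement their structured-output framework. It rests on several assumptions. The four-term ontology is hypothetical and does not use real GO identifiers. Predicted and annotated terms are propagated to their ancestors before scoring. Protein-centric precision and recall serve as the metrics. Both "methods" are hypothetical predictions for a single protein.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy ontology (NOT real GO terms): each term maps to its parent terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "kinase": {"catalysis"},
}
ROOT = "root"


def propagate(terms: Set[str]) -> Set[str]:
    """Add every ancestor of each term, then drop the uninformative root."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    closed.discard(ROOT)
    return closed


def precision_recall(predicted: Set[str], annotated: Set[str]) -> Tuple[float, float]:
    """Protein-centric precision and recall against the currently known annotations."""
    pred, true = propagate(predicted), propagate(annotated)
    tp = len(pred & true)  # terms that are both predicted and (currently) annotated
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


# Annotations of one hypothetical protein before (t0) and after (t1) new experiments.
annotations_t0 = {"dna_binding"}
annotations_t1 = {"dna_binding", "kinase"}  # "kinase" is confirmed only later

method_a = {"dna_binding", "kinase"}  # predicts the not-yet-confirmed function
method_b = {"dna_binding"}            # conservative prediction

if __name__ == "__main__":
    for name, pred in [("A", method_a), ("B", method_b)]:
        print(name, "t0:", precision_recall(pred, annotations_t0),
              "t1:", precision_recall(pred, annotations_t1))
```

At t0, method A's correct "kinase" prediction counts as a false positive. Its precision is therefore estimated at 0.5 and rises to 1.0 after re-evaluation. Method B's recall drops from 1.0 to 0.5 once the new term is known, so the two methods trade places under this metric. The toy case only illustrates the mechanism. It says nothing about how large such shifts are on real data.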